ABSTRACT

The Ohio Off-Grade Proficiency Tests help in monitoring
students' progress toward Ohio's adopted model courses of study in reading,
mathematics, citizenship, science, and writing. The purpose of this study was
to provide item response theory (IRT) parameter estimates and descriptive
statistics of the scoring categories of the short response items (SRI) and
extended response items (ERI) of the Ohio Off-Grade Proficiency
Test-Mathematics. It also aimed to provide information about the interactive
roles of Ability, Type of Item Response, Gender, and Race in student
performance by using both multiple-choice and partial credit scores in
determining student ability levels. Fifth-graders (n=4830) from a large urban
area in northeast Ohio participated in the study. Results indicate that
Gender did not play a significant role, whereas with regard to the factor Race
there were significant differences in student scores. Contains 19 references.

Partial Credit Analysis of Mathematics Items
from the Ohio Off-Grade Proficiency Tests

The Ohio Off-Grade Proficiency Tests (OOPT) help in monitoring students’ progress 
toward Ohio’s adopted model courses of study in reading, mathematics, citizenship, science, and 
writing. They include multiple-choice, short response, and extended response items (Riverside 
Publishing, 1997). While provided with a variety of descriptive reports, Ohio’s educational
researchers and decision makers do not have information regarding the role of the type of item
response and of factors such as gender, race, and ability level in the OOPT performance of local
student populations. Such information can help in making decisions about improving students’
achievement in urban schools populated by minority children, where the achievement problems
have been seen as more severe (see, e.g., the “A Nation at Risk” report of the National Commission on
Excellence in Education, 1983).

Previous studies have investigated differences in mathematics achievement as they relate 
to single factors such as gender and race, without providing information about their interaction 
with different types of item response and/or students’ ability level (e.g., Friedman, 1989, 1995; 
Benbow & Stanley, 1980; Dossey, Mullis, Lindquist, & Chambers, 1988; Lewis & Hoover, 1986; 
Cooper & Dorr, 1995; Graham, 1995; Losey, 1995; King, 1993). A recent study (DeMars, 1998)
investigated the role of gender, response format, and ability on a mathematics and science high
school proficiency exam, but (a) it did not take into account the factor Race, and (b) it used the
multiple-choice total score to determine student ability, i.e., it did not take into account the
partial credit scores of the students in determining their ability scores.

Purpose of the study 

The purpose of this study is twofold. First, it is to provide IRT parameter estimates and
descriptive statistics of the scoring categories of the short response items (SRI) and extended
response items (ERI) of the Ohio Off-Grade Proficiency Test-Mathematics (OOPT-M). Second,
by using both multiple-choice and partial credit scores in determining students’ ability levels, it is to
provide information about the interactive roles of Ability, Type of Item Response, Gender, and
Race in students’ performance on the OOPT-M.

Method 

Instrument 

Used in this study were results from the OOPT-M for grade five (Riverside Publishing, 
1995). It includes 30 multiple-choice items (MCI), 8 short response items (SRI), and 2 extended 
response items (ERI). A dichotomous scale (0, 1) is used for the MCI, a partial credit scale (0, 1,
2) for the SRI, and a partial credit scale (0, 1, 2, 3, 4) for the ERI. In learning outcomes, the
OOPT-M captures (a) patterns, relations, and functions, (b) problem-solving strategies, (c)
numbers and number relations, (d) geometry, (e) algebra, (f) measurement, (g) estimation and 
mental computation, and (h) data analysis and probabilities. 

Subjects 

Used in this study were the OOPT-M results of 4830 fifth-graders from a large urban area
in northeast Ohio. By racial groups, there were 994 White students (477 females and 517
males), 3242 Black students (1684 females and 1558 males), 348 Hispanic students (159 females
and 189 males), 38 Asian students (17 females and 21 males), and 208 students with no race
and/or gender group information. The OOPT-M scores of all 4830 students were used when 
determining the set of ability scores. Because of the small number of Asian students, only three 
racial groups, White, Black, and Hispanic, were included in analyses involving the factor Race. 
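
The group sizes reported above can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: the variable names are hypothetical, and the Hispanic gender split is taken from Tables 2 and 3 (159 females, 189 males).

```python
# (females, males) per racial group, as reported for the sample.
# Assumption: the Hispanic split follows Tables 2 and 3 (159 females, 189 males).
counts = {
    "White": (477, 517),
    "Black": (1684, 1558),
    "Hispanic": (159, 189),
    "Asian": (17, 21),
}
missing_group_info = 208  # students with no race and/or gender information

# Total number of students in each racial group
group_totals = {race: females + males for race, (females, males) in counts.items()}

# All students whose scores entered the estimation of the ability scale
total_students = sum(group_totals.values()) + missing_group_info

# Students available for analyses involving the factor Race
race_analysis_n = sum(group_totals[race] for race in ("White", "Black", "Hispanic"))

print(group_totals, total_students, race_analysis_n)
```

The totals per group and the overall total agree with the figures given in the text; the last value is the number of students in the three racial groups retained for the analyses involving Race.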

Procedures

For the purpose of providing IRT parameter estimates of the SRI and ERI of the test, the 
Generalized Partial Credit Model (Muraki, 1992) was used with calculations conducted via the 
computer program PARSCALE (Muraki & Bock, 1996). The Generalized Partial Credit Model
(GPCM) belongs to the Rasch family of polytomous item response models. It is an extension of
the partial credit model (PCM), developed by Masters (1982), which is appropriate for the
analysis of items that have more than two successively ordered option categories. The PCM does 
not contain a discriminating power parameter, while the GPCM (Muraki, 1992) does. The 
GPCM is based on the assumption that the probability of selecting the kth category of item j over
the preceding category, k - 1, is given by the following conditional probability $C_{jk}$:

$$C_{jk}(\theta) = \frac{P_{jk}(\theta)}{P_{j,k-1}(\theta) + P_{jk}(\theta)} = \frac{\exp[a_j(\theta - b_{jk})]}{1 + \exp[a_j(\theta - b_{jk})]}, \qquad (1)$$

where $P_{jk}(\theta)$ is the probability for a person with ability $\theta$ to select the kth category from $m_j$
possible categories of item j ($k = 2, 3, \ldots, m_j$), and $a_j$ is the slope (discriminating power) of item j.

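After solving for the $P_{jk}(\theta)$ from (1), developed for each $k = 2, 3, \ldots, m_j$, the result is the GPCM:

$$P_{jk}(\theta) = \frac{\exp\left[\sum_{v=1}^{k} a_j(\theta - b_{jv})\right]}{\sum_{c=1}^{m_j} \exp\left[\sum_{v=1}^{c} a_j(\theta - b_{jv})\right]} \qquad (2)$$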

The parameters $b_{jk}$ in equation (2) are called item step parameters (ISPs). They are not
sequentially ordered within item j because $b_{jk}$ represents the relative magnitude of the adjacent
probabilities $P_{j,k-1}(\theta)$ and $P_{jk}(\theta)$. Geometrically, the $b_{jk}$ are the points on the ability scale, $\theta$, at
which the curves of $P_{j,k-1}(\theta)$ and $P_{jk}(\theta)$ intersect. These two curves, referred to as item category
response functions (ICRFs), intersect only once, anywhere along the $\theta$ scale.
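
To make equation (2) concrete, the following minimal sketch evaluates the GPCM category probabilities for a single item. It is not the PARSCALE implementation: the function name and the parameter values are hypothetical, and the first entry of the step-parameter list is treated as a placeholder (set to 0 here) because it enters every cumulative sum and therefore cancels between numerator and denominator.

```python
import math

def gpcm_probabilities(theta, a, b):
    """Category probabilities for one item under the GPCM, equation (2).

    theta: ability; a: item slope; b: step parameters for categories 1..m.
    b[0] is a placeholder for the first category (assumed 0 here); it is
    common to all cumulative sums and cancels out of the ratio.
    """
    # Cumulative sums of a * (theta - b_v) for v = 1..k, one per category k
    cumulative, total = [], 0.0
    for b_v in b:
        total += a * (theta - b_v)
        cumulative.append(total)
    # Subtract the maximum before exponentiating for numerical stability
    top = max(cumulative)
    weights = [math.exp(z - top) for z in cumulative]
    norm = sum(weights)
    return [w / norm for w in weights]

# Hypothetical three-category item (illustrative values, not taken from Table 1)
slope, steps = 0.6, [0.0, -0.5, 1.0]
for theta in (-0.5, 0.0, 1.0):
    print(theta, [round(p, 3) for p in gpcm_probabilities(theta, slope, steps)])
```

At an ability equal to a step parameter, the two adjacent category probabilities returned by the function are equal, which is the intersection property of the ICRFs described above.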

The ability scores of the students were calculated using PARSCALE. Students with 
ability scores in the lower 27% were assigned to the low ability group, students with ability 
scores in the upper 27% were assigned to the high ability group, and the rest of the students were 
assigned to the medium ability group. The OOPT-M results were further analyzed by gender
(male, female), race (White, Black, Hispanic), ability (low, medium, high), and type of item 
response (multiple-choice, short response, extended response). All statistical procedures were 
performed using SPSS (SPSS Inc., 1997). 
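
The 27% grouping rule can be sketched as follows. This is an illustration rather than the procedure actually run in SPSS: the function names are hypothetical, and the linear-interpolation percentile used here is an assumption, since the percentile definition applied in the study is not stated.

```python
def percentile(sorted_values, pct):
    """Percentile (pct in 0..100) of an already sorted list, by linear interpolation."""
    if not sorted_values:
        raise ValueError("percentile of an empty list is undefined")
    pos = (len(sorted_values) - 1) * pct / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] + frac * (sorted_values[upper] - sorted_values[lower])

def assign_ability_groups(scores, low_pct=27, high_pct=73):
    """Label each score 'low', 'medium', or 'high' using the two percentile cutoffs."""
    ordered = sorted(scores)
    low_cut = percentile(ordered, low_pct)
    high_cut = percentile(ordered, high_pct)
    groups = []
    for score in scores:
        if score < low_cut:        # below the 27th percentile
            groups.append("low")
        elif score > high_cut:     # above the 73rd percentile
            groups.append("high")
        else:
            groups.append("medium")
    return groups, low_cut, high_cut

# Hypothetical ability scores on the logit scale
example_scores = [-2.1, -0.9, -0.4, 0.0, 0.2, 0.5, 1.3]
print(assign_ability_groups(example_scores))
```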

Results 

Table 1 shows GPCM results for the 10 items of the type SRI and ERI. The item 
numbers are the same as in the OOPT-M. As the location of each item represents its difficulty,
one can see, for example, that Item 34 is the most difficult item and Item 4, the easiest item 
among the 10 items. The discriminating power of the items, given by their slopes, varied within a 
relatively small interval, (.36, .79). For example, Item 19 has the highest slope, .79, and, hence, 
it is the best in discriminating students with different scores on the ability scale. For the 
interpretation of the item step information given in Table 1, it should be noted that each “step” in 
a given item is defined here by the partial credit score students may obtain on this item. The PC 
column in Table 1 shows the percent of students assigned to different scoring categories of each 
item. With Item 4, for example, 42.1% of the students were given a score of 0, 28.8%, a score of 
1, and 29.1%, a score of 2. The ISP column shows the item step parameter and the SE column
shows the standard error of this parameter. Still with Item 4, the ISP of .12 for the score of 1, 
and the ISP of -.12 for the score of 2, show that it has been more difficult for the students to 
make the transition from step 0 to step 1 than the transition from step 1 to step 2. With Item 11, 
however, the most difficult part for the students was to make the transition from step 1 to step 2
since the highest ISP, 1.39, in this item corresponds to a score of 2. 

The results from PARSCALE (Muraki & Bock, 1996) showed that the ability scores of
all 4830 fifth-graders on the OOPT-M were spread between -4.82 and 3.68 on the logit scale.
Students with ability scores below the 27th percentile (P27 = -.70) were assigned to the low ability
group, students with ability scores above the 73rd percentile (P73 = .37) were assigned to the high
ability group, and the rest of the students were assigned to the medium ability group. Table 2 
shows the distribution of White, Black, and Hispanic students by gender and ability level. In 
proportional representation, Hispanic females were highest at the low ability level, 31%, Black 
males were highest at the medium ability level, 48%, and White males were highest at the high 
ability level, 41%. Table 3 shows the means and standard deviations of the OOPT-M scores by 
gender and race across the types of item response. Table 4 shows the multivariate test results for 
the OOPT-M score differences by gender, race, and ability. There was a significant main effect 
of factor Ability, which is logical, a significant main effect of factor Race, and a significant 
interaction between Race and Ability. There was no significant main effect of factor Gender and 
no significant interactions between Gender and either of the other two factors. These results show 
that factor Gender does not play any significant role in the OOPT-M scores of the fifth-graders. 

Tables 5, 6, and 7 show the results from a multivariate repeated measures design, with 
one within-subjects factor, Type of item response, and one between-subjects factor, Race. The 
three levels of the within-subjects factor were the average z-scores of the students on the MCI, 
SRI, and ERI, respectively. The rationale for using these scores as repeated measures is that, in 
the OOPT-M, they represent three different measures of the same mathematics ability. In cases 
with a significant main effect of factor Race, post-hoc comparisons were conducted using
Dunnett’s T3 pairwise comparisons test (see, e.g., SPSS Inc., 1997, p. 37). The results are
presented by ability levels: 

1. At low ability level (Table 5), Race was a significant factor for the OOPT-M score on
the extended response items. The post-hoc comparisons (Table 8) showed a significant difference 
between the White and Hispanic groups, with higher performance of the Hispanic students. No 
significant differences between the three racial groups were found on the multiple-choice and 
short response items. The significance of the factor Type of item response on the within-subjects 
contrast (MCI - SRI) and the graphical representation in Figure 1 indicate that the performance of 
the low ability students increased significantly from multiple-choice to short response items. 
There was no significant change in the average score from short response to extended response 
items. Also, there was no significant interaction between Race and Type of item response. The 
OOPT-M profile of the low ability students, by racial groups and types of item response, is 
presented in Figure 1.

2. At medium ability level (Table 6), Race was a significant factor for the OOPT-M scores 
on multiple-choice items. The post-hoc comparisons (Table 8) showed a significant difference 
between the White and Black groups and, also, between the White and Hispanic groups. The 
examination of the 95% confidence intervals shows that the White students performed 
significantly better than both Black and Hispanic students on the multiple-choice items. There 
were no significant differences between the racial groups on the short response and extended 
response items. The factor Type of item response was found to be significant on the within-subjects
contrasts (MCI - SRI) and (SRI - ERI). These results, with the interpretation of the score
profiles in Figure 2, indicate that the average score of the medium ability students significantly
decreased from multiple-choice to short response items and significantly increased from short
response to extended response items. The significant interaction between Race and Type of item
response on the within-subjects contrast (SRI - ERI) indicates that the difference between the 
average scores on short response and extended response items varied significantly, in a disordinal 
way (see Figure 2), across the three racial groups. 

3. At high ability level (Table 7), Race was a significant factor for the multiple-choice and 
extended response items. The post-hoc comparisons (Table 8) showed that the White students 
performed better than both Black and Hispanic students on the multiple-choice items and better 
than the Black students on the extended response items. There were no differences between the 
racial groups on the short response items. The significant interaction between Race and Type of 
item response on the within-subjects contrast (MCI - SRI) indicates that the difference between 
the average scores on multiple-choice and short response items varied significantly across the 
racial groups, with a disordinal trend between the Hispanic and Black groups (see Figure 3). 
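
The repeated measures compared above are average z-scores per type of item response. A minimal sketch of one way to obtain such scores is given below. It assumes that "average z-score" means the mean, over the items of one type, of each student's item-level z-scores; the paper does not spell out the computation, so this reading, the function names, and the data are all hypothetical.

```python
import statistics

def zscores(values):
    """Standardize raw scores to mean 0 and sample standard deviation 1."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # assumes the scores are not all identical
    return [(v - mean) / sd for v in values]

def mean_item_zscores(item_scores):
    """item_scores maps an item id to a list of raw scores, one per student.

    Returns one value per student: the mean of that student's z-scores
    over all items in the mapping (e.g., all short response items).
    """
    standardized = [zscores(scores) for scores in item_scores.values()]
    n_students = len(standardized[0])
    return [
        statistics.mean([column[i] for column in standardized])
        for i in range(n_students)
    ]

# Hypothetical short response scores (0-2) of four students on two items
sri_scores = {"item_a": [0, 1, 2, 1], "item_b": [0, 0, 2, 2]}
print(mean_item_zscores(sri_scores))
```

Applying the same function separately to the multiple-choice, short response, and extended response items would give three scores per student on a common scale, which is what allows them to be treated as levels of one within-subjects factor.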

Discussion 

The results of this study indicate that Gender did not play a significant role in the OOPT-M
scores for the target population of fifth-graders. With regard to the factor Race, there were significant
differences in the OOPT-M performance of White, Black, and Hispanic students, but they were 
not in favor of a single racial group across different ability levels and types of item response. 

At low ability level, the Hispanic students performed better than the White students on 
the extended response items. No other performance differences between the racial groups were 
found across the three types of items. The lowest performance of the low ability students from all 
racial groups was on the multiple-choice items (see Figure 1). 

At medium ability level, the White students performed better than both Black and 
Hispanic students on the multiple-choice items. There were no other differences between the 
racial groups across the three types of item response. The performance of all racial groups
decreased from multiple-choice to short response items and increased from short response to 
extended response items. The difference between students’ performance on short response and 
extended response items varied in a disordinal way across the racial groups (see Figure 2). 

At high ability level, the White students performed better than both Black and Hispanic 
students on the multiple-choice items and better than the Black students on the extended 
response items. There were no other performance differences between the racial groups across 
the three types of item response. The performance of the students from all racial groups increased 
from multiple-choice to short response items and decreased from short response to extended 
response items. The performance difference between Black and Hispanic students varied in a 
disordinal way in the transition from multiple-choice to short response items (see Figure 3). 

Overall, the White students performed best on the multiple-choice items, at medium and 
high ability levels, and on the extended response items at high ability level. The Hispanic 
students performed best on the extended response items at low ability level. At each ability level, 
there were no differences between the racial groups on the short response items. These findings
suggest that structuring mathematics lessons in ways that relate to the students’ culture and
problem-solving experience can promote the academic achievement of students from different racial groups.

In conclusion, the results from this study provide information about the psychometric 
characteristics of short response and extended response items of the OOPT-M and information 
about the role of gender and race in the performance of students with different ability levels on
multiple-choice, short response, and extended response items. This information can be useful to 
Ohio’s test analysts and educators in adapting strategies for teaching mathematics and OOPT-M 
training sessions to a diverse body of students in urban schools populated by minority children. 

Table 1

Parameter Estimates and Descriptive Statistics of Steps in the Short Response Items
and Extended Response Items of the Ohio Off-Grade Proficiency Test-Mathematics

Item   Location^a    Slope^b      Step   PC^c    Mean     S.D.    ISP^d    SE
----   -----------   ----------   ----   -----   ------   -----   ------   ----
  4     .29 (.02)    .56 (.02)     0     42.1    12.64    2.49      .00    .00
                                   1     28.8    15.10    3.28      .12    .04
                                   2     29.1    18.77    4.12     -.12    .04

  8    1.97 (.04)    .68 (.02)     0     88.5    14.45    3.50      .00    .00
                                   1      5.3    19.08    4.10     -.81    .06
                                   2      5.2    22.81    4.25      .81    .08

 11     .68 (.02)    .36 (.01)     0     34.5    11.99    1.98      .00    .00
                                   1     16.2    13.76    2.51     -.13    .07
                                   2     25.4    16.32    3.27     1.39    .07
                                   3     13.0    18.24    3.27     -.83    .08
                                   4     10.9    20.65    3.93     -.43    .09

 15    1.33 (.04)    .57 (.02)     0     71.0    13.74    3.10      .00    .00
                                   1     16.7    16.88    3.89     -.25    .04
                                   2     12.3    20.74    4.04      .25    .06

 19    1.47 (.03)    .79 (.02)     0     80.8    13.97    3.17      .00    .00
                                   1     10.4    18.39    3.54     -.30    .04
                                   2      8.8    21.85    4.09      .30    .05

 24     .95 (.02)    .75 (.02)     0     67.6    13.35    2.67      .00    .00
                                   1     15.3    17.14    3.60     -.26    .03
                                   2     17.2    20.33    4.11      .26    .04

 28    2.32 (.07)    .59 (.02)     0     85.9    14.39    3.56      .00    .00
                                   1     11.0    19.24    4.49     -.02    .05
                                   2      3.2    21.08    4.17      .02    .09

 31    2.05 (.04)    .40 (.01)     0     50.9    13.45    3.10      .00    .00
                                   1     37.9    15.54    3.32     1.71    .05
                                   2      4.5    19.20    4.02    -1.49    .11
                                   3      5.5    22.46    4.01     1.42    .13
                                   4      1.1    24.53    4.04    -1.63    .22

 34    2.65 (.08)    .70 (.03)     0     93.5    14.75    3.78      .00    .00
                                   1      5.0    19.33    4.61     -.25    .06
                                   2      1.4    25.17    4.08      .25    .12

 38    1.75 (.04)    .51 (.02)     0     56.4    13.53    3.18      .00    .00
                                   1     38.4    16.68    3.97     1.29    .04
                                   2      5.2    21.11    4.68    -1.29    .08

^a Given in parentheses is the standard error (SE) of the location (difficulty) of the item;
^b Given in parentheses is the standard error (SE) of the slope of the item;
^c Percent Correct;
^d Item Step Parameters

Table 2

Proportional Distribution of the Students Across Ability Levels by Gender and Race

                                              Race
                       White                  Black                   Hispanic
Ability Level    Female      Male       Female       Male        Female      Male
                 (n = 477)   (n = 517)  (n = 1684)   (n = 1558)  (n = 159)   (n = 189)
-------------    ---------   ---------  ----------   ----------  ---------   ---------
Low                 18          15         30           30          31          28
Medium              44          44         47           48          43          45
High                38          41         24           22          26          27

Note. The values in the table represent percentages.

Table 3

Means and Standard Deviations of OOPT-M Scores by Gender, Race, and Type of Item
Response

                                                  Race
                            White                  Black                   Hispanic
                      Female      Male       Female       Male        Female      Male
Items                 (n = 477)   (n = 517)  (n = 1684)   (n = 1558)  (n = 159)   (n = 189)
-------------------   ---------   ---------  ----------   ----------  ---------   ---------
Multiple-Choice
  M                     16.79       17.60      15.24        15.20       14.84       15.26
  SD                     4.55        4.67       4.30         4.27        4.30        4.42
Short Response
  M                      3.47        3.91       2.80         2.57        2.84        2.98
  SD                     2.97        3.20       2.75         2.60        2.94        2.94
Extended Response
  M                      2.48        2.76       2.05         2.01        2.31        2.28
  SD                     1.92        2.06       1.67         1.75        1.80        1.85

Note. The maximum possible score is 31 for the multiple-choice, 16 for the short response, and
8 for the extended response items.

Table 4

Multivariate Tests of OOPT-M Score Differences by Gender,
Race, and Ability Level

                 Wilks’
Source           lambda    df^a     df^b          F
-----------      ------    ----    -----    ---------
Gender (G)         .99       3      4564       1.71
Race (R)           .99       6      9128       6.17**
Ability (A)        .34       6      9128    1088.42**
G x R              .99       6      9128       1.33
G x A              .99       6      9128       0.80
R x A              .99      12     12075       3.10**
G x R x A          .99      12     13688       0.90

^a Hypothesis degrees of freedom. ^b Error degrees of freedom.
*p < .05. **p < .01.

Table 5

Multivariate Repeated Measures Analysis on OOPT-M
Scores for Students at Low Ability Level

                                          F
Source             df       MCI         SRI         ERI
----------------  ----    ---------   ---------   --------
Between subjects
  Race (R)           2      1.86         .79       3.06**
  Error           1231     (0.30)      (0.06)      (0.18)

Within subjects contrasts
                          MCI - SRI   SRI - ERI
  Type (T)^a         1     45.03**      0.16
  T x R              2      2.50        1.36
  Error           1234     (0.37)      (0.27)

Note. Values enclosed in parentheses represent mean
square errors. MCI - SRI is the difference between
multiple-choice and short response items, and SRI - ERI,
between short response and extended response items.
^a Type of item response.

*p < .05. **p < .01.







Table 6

Multivariate Repeated Measures Analysis on OOPT-M
Scores for Students at Medium Ability Level

                                          F
Source             df       MCI         SRI         ERI
----------------  ----    ---------   ---------   --------
Between subjects
  Race (R)           2      7.33**      2.76        1.99
  Error           1648     (0.38)      (0.23)      (0.53)

Within subjects contrasts
                          MCI - SRI   SRI - ERI
  Type (T)^a         1     73.80**     35.54**
  T x R              2      1.26        3.70**
  Error           2118     (0.62)      (0.87)

Note. Values enclosed in parentheses represent mean
square errors. MCI - SRI is the difference between
multiple-choice and short response items, and SRI - ERI,
between short response and extended response items.
^a Type of item response.

*p < .05. **p < .01.




Table 7

Multivariate Repeated Measures Analysis on OOPT-M
Scores for Students at High Ability Level

                                          F
Source             df       MCI         SRI         ERI
----------------  ----    ---------   ---------   --------
Between subjects
  Race (R)           2     11.91**      1.84        7.94**
  Error           1231     (0.51)      (0.78)      (0.82)

Within subjects contrasts
                          MCI - SRI   SRI - ERI
  Type (T)^a         1     50.92**     30.71**
  T x R              2      3.93*       1.37
  Error           1234     (0.80)      (1.36)

Note. Values enclosed in parentheses represent mean
square errors. MCI - SRI is the difference between
multiple-choice and short response items, and SRI - ERI,
between short response and extended response items.
^a Type of item response.

*p < .05. **p < .01.

Table 8

Post-Hoc Comparisons for Significant Main Effects of Factor Race

Type                                                                  95% Confidence Interval
of Item                            Mean Difference^a                   Lower       Upper
Response    Group 1    Group 2     (Group 1 - Group 2)      SE         Bound       Bound
--------    -------    --------    -------------------     ----        -----       -----
Low Ability Level
ERI         White      Hispanic         0.13*              0.05        0.01        0.26

Medium Ability Level
MCI         White      Black            0.12**             0.03        0.04        0.20
MCI         White      Hispanic         0.16*              0.06        0.03        0.30

High Ability Level
MCI         White      Black            0.20**             0.04        0.09        0.30
MCI         White      Hispanic         0.29**             0.08        0.08        0.50
ERI         White      Black            0.22**             0.06        0.08        0.36

Note. Reported are only significant mean differences captured by the post-hoc tests.
^a In z-scores.

*p < .05. **p < .01.

Figure 1. OOPT-M performance profiles of low ability students by racial groups
(vertical axis: Mean (z-scores); horizontal axis: Type of Item Response).

Figure 2. OOPT-M performance profiles of medium ability students by racial groups
(vertical axis: Mean (z-scores); horizontal axis: Type of Item Response, with levels
multiple-choice, short response, extended response).

Figure 3. OOPT-M performance profiles of high ability students by racial groups
(vertical axis: Mean (z-scores); horizontal axis: Type of Item Response, with levels
multiple-choice, short response, extended response).